{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Ungraded Lab:  Multiclass Classification: One-vs-all\n",
    "\n",
    "One vs All is one method for selection when there are more than two categories.   \n",
    "![pic](./figures/onevsmany.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goals\n",
    "In this lab you will:\n",
    "- utilize the functions you have developed (compute_cost, compute_gradient, predict, gradient_descent) to do binomial (yes/no) classification\n",
    "- write a multi-class prediction routine to select between n binomial decisions\n",
    "- utilize decision boundary plotting logic\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Outline\n",
    "- [Tools](#tools)\n",
    "- [Dataset](#dataset)\n",
    "- [One vs All Implementation](#ova)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Multiclass Classification: One-vs-all (OVA)\n",
    "In this lab, we will explore how to use the One-vs-All method for classification when there are more than two categories. This technique is an extention of two class or binomial logistic regression that we have working with. \n",
    "\n",
    "In binomail logistic regression, we train a model to classify samples that are in a class or not in a class. One-vs-All(OVA) extends this method by training $n$ models. Each model is responsible for identifying one class. A model for a given class is trained by recasting the training set to identify one class as positive and all the rest as negative. To make predictions, an example is processed by all $n$ models and the model with the largest prediction output is selected.\n",
    "\n",
    "In this lab, we will build an OVA classifier.\n",
    "## Tools \n",
    "- We will utilize our previous work to build and train models. These routines are provided. \n",
    "- Plotting decision boundaries and datasets is helpful. Producing these plots is quite involved so helper routines are provided below.\n",
    "        - plot_mc_decision_boundary() will use a prediction routine you will write in this assigment, `predict_mc`\n",
    "- We will create a multi-class data set. Popular [`SkLearn`](https://scikit-learn.org/stable/) routines are utilized."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from lab_utils import *\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.datasets import make_blobs\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "import copy\n",
    "import math"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These routines are provided but reviewing their operation is instructive. Plotting routines often make use of many esoteric but useful numpy routines. Plotting decision boundaries makes use of `matplotlib's` contour plot. A contour plot draws a line at boundary of a change in values. That capability is used to delineate changes in decisions. Briefly, the routine has 3 steps:\n",
    "- create a fine mesh of locations in a 2-D grid. Build an array of those points.\n",
    "- make predictions for each of those points. In this case, this includes the vote for the best prediction.\n",
    "- plot the mesh vs the predictions(`Z`) using a contour plot."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Plot a multi-class decision boundary\n",
    "def plot_mc_decision_boundary(X,nclasses, W, b , predict_mc_function, class_labels=None, legend=False):\n",
    "\n",
    "    # create a mesh to points to plot\n",
    "    h = 0.1  # step size in the mesh\n",
    "    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n",
    "    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n",
    "    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n",
    "                         np.arange(y_min, y_max, h))\n",
    "    points = np.c_[xx.ravel(), yy.ravel()]\n",
    "\n",
    "    #make predictions for each point in mesh\n",
    "    Z = predict_mc_function(points,W,b)\n",
    "    Z = Z.reshape(xx.shape)\n",
    "\n",
    "    #contour plot highlights boundaries between values - classes in this case\n",
    "    plt.figure()\n",
    "    plt.contour(xx, yy, Z, colors='g') \n",
    "    plt.axis('tight')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot  multi-class training points\n",
    "def plot_mc_data(X, y, class_labels=None, legend=False):\n",
    "    classes = np.unique(y)\n",
    "    for i in classes:\n",
    "        label = class_labels[i] if class_labels else \"class {}\".format(i)\n",
    "        idx = np.where(y == i)\n",
    "        plt.scatter(X[idx, 0], X[idx, 1],  cmap=plt.cm.Paired,\n",
    "                    edgecolor='black', s=20, label=label)\n",
    "    if legend: plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We're providing the routines which you have developed in previous labs to create and fit/train a model. Feel free to replace these with your own versions. (Keep a copy of the original just in case.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, predict_function, alpha, num_iters): \n",
    "    \"\"\"\n",
    "    Performs batch gradient descent to learn theta. Updates theta by taking \n",
    "    num_iters gradient steps with learning rate alpha\n",
    "    \n",
    "    Args:\n",
    "      X :    (array_like Shape (m,n)\n",
    "      y :    (array_like Shape (m,1))\n",
    "      w_in : (array_like Shape (n,1)) Initial values of parameters of the model\n",
    "      b_in : (scalar)                 Initial value of parameter of the model\n",
    "      cost_function:                  function to compute cost\n",
    "      gradient_function:              function to compute the gradient\n",
    "      predict_function                function to compute the output of the model\n",
    "      alpha : (float)                 Learning rate\n",
    "      num_iters : (int)               number of iterations to run gradient descent\n",
    "    Returns\n",
    "      w : (array_like Shape (n,)) Updated values of parameters of the model after\n",
    "          running gradient descent\n",
    "      b : (scalar)                Updated value of parameter of the model after\n",
    "          running gradient descent\n",
    "    \"\"\"\n",
    "    \n",
    "    # number of training examples\n",
    "    m = len(X)\n",
    "    \n",
    "    # An array to store cost J and w's at each iteration primarily for graphing later\n",
    "    J_history = []\n",
    "    w_history = []\n",
    "    w = copy.deepcopy(w_in)  #avoid modifying global w within function\n",
    "    b = b_in\n",
    "    \n",
    "    for i in range(num_iters):\n",
    "\n",
    "        # Calculate the gradient and update the parameters\n",
    "        dJdb,dJdw = gradient_function(X, y, w, b, predict_function)   ##None\n",
    "\n",
    "        # Update Parameters using w, b, alpha and gradient\n",
    "        w = w - alpha * dJdw               ##None\n",
    "        b = b - alpha * dJdb               ##None\n",
    "\n",
    "        # Save cost J at each iteration\n",
    "        if i<100000:      # prevent resource exhaustion \n",
    "            cost =  cost_function(X, y, w, b)\n",
    "            J_history.append(cost)\n",
    "\n",
    "        # Print cost every at intervals 10 times or as many iterations if < 10\n",
    "        if i% math.ceil(num_iters/10) == 0:\n",
    "            w_history.append(w)\n",
    "            print(f\"Iteration {i:4}: Cost {float(J_history[-1]):8.2f}   \")\n",
    "    #print(w,b,len(J_history), len(w_history) )   \n",
    "    return w, b, J_history, w_history #return w and J,w history for graphing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_cost_logistic_matrix(X, y, w, b):\n",
    "    \"\"\"\n",
    "    Computes the cost over all examples\n",
    "    Args:\n",
    "      X : (array_like Shape (m,n)) data, m examples by n features\n",
    "      y : (array_like Shape (m,1)) target value \n",
    "      w : (array_like Shape (n,1)) Values of parameters of the model      \n",
    "      b : (array_like Shape (n,1)) Values of bias parameter of the model\n",
    "    Returns:\n",
    "      total_cost: (scalar)         cost \n",
    "    \"\"\"\n",
    "    m = X.shape[0]\n",
    "    \n",
    "    f = sigmoid(X @ w + b)\n",
    "    total_cost = (1/m)*(np.dot(-y.T, np.log(f)) - np.dot((1-y).T, np.log(1-f)))\n",
    "    \n",
    "    return total_cost[0,0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict_logistic_matrix(X, w, b): \n",
    "    \"\"\"\n",
    "    single predict using linear regression\n",
    "    Args:\n",
    "      x : (array_like Shape (m,n)) feature values house size, bedrooms..\n",
    "      w : (array_like Shape (n,)) parameters for prediction   \n",
    "      b : (scalar               ) parameter  for prediction   \n",
    "    Returns\n",
    "      p: ((array_like Shape (n,)  predictions\n",
    "    \"\"\"\n",
    "    p = sigmoid(X @ w + b )         \n",
    "    \n",
    "    return(p)    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict_thresh(X, w, b, threshold=0.5): \n",
    "    \"\"\"\n",
    "    Predict whether the label is 0 or 1 using learned logistic\n",
    "    regression parameters w, b. Threshold the output of the sigmoid to predict 1 or 0.\n",
    "    \n",
    "    Parameters\n",
    "    ----------\n",
    "    X : array_like\n",
    "        Shape (m, n) \n",
    "    \n",
    "    w : array_like\n",
    "        Parameters of the model\n",
    "        Shape (n, 1)\n",
    "    b : scalar\n",
    "    \n",
    "    Returns\n",
    "    -------\n",
    "\n",
    "    p: array_like\n",
    "        Shape (m,)\n",
    "        The predictions for X using a threshold at 0.5\n",
    "    \"\"\"\n",
    "    # number of training examples\n",
    "    m = X.shape[0]   \n",
    "    p = np.zeros(m)\n",
    "   \n",
    "    for i in range(m):\n",
    "        f_w = sigmoid(np.dot(w.T, X[i])+b)\n",
    "        p[i] = f_w >= threshold\n",
    "    \n",
    "    return p"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_gradient_logistic_matrix(X, y, w, b, predict_function): \n",
    "    \"\"\"\n",
    "    Computes the gradient for linear regression \n",
    " \n",
    "    Args:\n",
    "      X : (array_like Shape (m,n)) variable such as house size \n",
    "      y : (array_like Shape (m,1)) actual value \n",
    "      w : (array_like Shape (n,1)) Values of parameters of the model      \n",
    "      b : (scalar )                Values of parameter of the model      \n",
    "      predict_function: (function) function to call to make prediction\n",
    "    Returns\n",
    "      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. \n",
    "      dJdb: (scalar)                 The gradient of the cost w.r.t. the parameter b. \n",
    "                                  \n",
    "    \"\"\"\n",
    "    m,n = X.shape\n",
    "    f_wb = predict_function(X, w, b) \n",
    "    err  = f_wb - y                 \n",
    "    dJdw = (1/m) * (X.T @ err)     \n",
    "    dJdb = (1/m) * np.sum(err)     \n",
    "        \n",
    "    return dJdb,dJdw"
   ]
  },
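  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, the provided routines above can be read as the following matrix expressions. This is a summary derived from the code, with $\\mathbf{f}$, $\\sigma$ and $\\alpha$ as notation assumed here for the model output, the sigmoid function and the learning rate:\n",
    "\n",
    "$$\\mathbf{f} = \\sigma(X\\mathbf{w} + b)$$\n",
    "$$J(\\mathbf{w},b) = \\frac{1}{m}\\left[-\\mathbf{y}^T\\log(\\mathbf{f}) - (1-\\mathbf{y})^T\\log(1-\\mathbf{f})\\right]$$\n",
    "$$\\frac{\\partial J}{\\partial \\mathbf{w}} = \\frac{1}{m}X^T(\\mathbf{f}-\\mathbf{y}), \\qquad \\frac{\\partial J}{\\partial b} = \\frac{1}{m}\\sum_{i=1}^{m}\\left(f^{(i)}-y^{(i)}\\right)$$\n",
    "$$\\mathbf{w} \\leftarrow \\mathbf{w} - \\alpha\\frac{\\partial J}{\\partial \\mathbf{w}}, \\qquad b \\leftarrow b - \\alpha\\frac{\\partial J}{\\partial b}$$"
   ]
  },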
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name='dataset'></a>\n",
    "##  Dataset\n",
    "Below, we use an `SkLearn` tool to create 3 'blobs' of data. This is a quick and easy way to create data to classify. Using NumPy's [`np.unique`](https://numpy.org/doc/stable/reference/generated/numpy.unique.html), we can look at the number and values of the classes.\n",
    "- **Note** we're creating 3 classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# make 3-class dataset for classification\n",
    "centers = [[-5, 0], [0, 4.5], [5, -1]]\n",
    "X_train, y_train = make_blobs(n_samples=500, centers=centers, cluster_std=0.85,random_state=40)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_mc_data(X_train,y_train,[\"blob one\", \"blob two\", \"blob three\"], legend=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# show classes in data set\n",
    "print(f\"unique classes {np.unique(y_train)}\")\n",
    "# show how classes are represented\n",
    "print(f\"unique classes {y_train[:10]}\")\n",
    "# show shapes of our dataset\n",
    "print(f\"shape of X_train: {X_train.shape}, shape of y_train: {y_train.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When you create the all-vs one training set, for each class, you will need to create a 'binary' training set from `y_train`. This is a set with values set to `1` for all examples in that class. NumPy has a feature which makes this convenient. Shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_cat_2 = (y_train == 2) + 0\n",
    "print(y_cat_2[:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You will need binary (0,1) values. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_cat_2 = (y_train == 2).astype(float)\n",
    "print(y_cat_2[:10])"
   ]
  },
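  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same comparison can also build the binary targets for every class in one step. The following is a minimal sketch rather than part of the lab's required code; `y_toy`, `classes_toy` and `Y_ova` are made-up names and values used only for illustration:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "y_toy = np.array([0, 2, 1, 2, 0])      # made-up labels for illustration\n",
    "classes_toy = np.unique(y_toy)         # the distinct classes: 0, 1, 2\n",
    "\n",
    "# compare an (m,1) column of labels with the (c,) class vector;\n",
    "# broadcasting yields an (m,c) array with one binary column per class\n",
    "Y_ova = (y_toy.reshape(-1, 1) == classes_toy).astype(float)\n",
    "print(Y_ova)\n",
    "```\n",
    "\n",
    "Column `k` of `Y_ova` plays the same role as the single-class target built above for class `k`."
   ]
  },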
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a name='ova'></a>\n",
    "##  One Vs All Implementation\n",
    "\n",
    "You will implement the OVA algorithm in three steps.\n",
    "- create and train three 'models'. Each trained to select one of the three classes.\n",
    "- create a routine `predict_mc` that will use the models to make predictions and select the best answer.\n",
    "- plot the decision boundary using the prediction routine.\n",
    "\n",
    "### Step 1: Create and Train 3 models.\n",
    "The steps involved will be familiar from past labs utilizing gradient descent.  \n",
    "For each class:\n",
    "- separate the target, `y` associated with the current class. \n",
    "- create `w_init` and `b_init`, initial values for the parameters. \n",
    "- call gradient descent. alpha=1e-2 and num_iters=1000 works well. \n",
    "    - The call to gradient_descent has a number of arguments. The routine is above. Double check your solution with the code in *Hint*. Notice how gradient descent utilizes all of the routines you have developed thus far.\n",
    "    - you will call `predict_logistic_matrix` to perform the prediction, this does Not contain the threshold logic. You trian without threshold and later, when using the model add a threshold.\n",
    "    - This returns parameters $w$ and $b$. $w$ and $b$ constitute your *model* which you will store in an arrays. The array, importantly, has *one column for each model*. This arrangement will allow the use of matrix operations and will become familiar in future labs.\n",
    "- call predict with the training data and your model ($w$,$b$) to plot the results of training. \n",
    "\n",
    "Below there is a for loop over each of the classes. \n",
    "- creates an target array with the current class set to one and all others set to zero.\n",
    "- your code\n",
    "- plots this interpretation of the data\n",
    "- plots the predicted values\n",
    "Replace `None` with your code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "  <summary><font size=\"2\" color=\"darkgreen\"><b>Hints</b></font></summary>\n",
    "\n",
    "```python\n",
    "classes=np.unique(y_train)   # three classes, [0,1,2]\n",
    "m,n = X_train.shape          # number of examples, number of features\n",
    "c = len(classes)             # number of classe\n",
    "\n",
    "# storage for our models (w), one column per class\n",
    "W_models = np.zeros((n,len(classes)))   \n",
    "b_models = np.zeros(c)\n",
    "plt.figure(figsize=(14, 14))             \n",
    "\n",
    "for i in classes:\n",
    "    yc = (y_train == classes[i]).astype(float)\n",
    "    yc = yc.reshape(-1,1)  \n",
    "\n",
    "    ### START CODE HERE ### \n",
    "    w_init = np.zeros((2,1))                                                               \n",
    "    b_init = 0.                                                                            \n",
    "    w_final, b_final,_,_ = gradient_descent(X_train, yc, w_init, b_init,                   \n",
    "                                      compute_cost_logistic_matrix,                       \n",
    "                                      compute_gradient_logistic_matrix,                  \n",
    "                                      predict_logistic_matrix,                            \n",
    "                                      alpha = 1e-2, num_iters=1000)                         \n",
    "    W_models[:,i] = w_final[:,0]                                                          \n",
    "    b_models[i] = b_final                                                                 \n",
    "    pred =  predict_thresh(X_train, w_final,b_final )                                    \n",
    "\n",
    "    ### END CODE HERE ###         \n",
    "\n",
    "    #Left Side, training data in All vs i\n",
    "    ax = plt.subplot(3,2, 2*i + 1)\n",
    "    plot_mc_data(X_train, yc,legend=True); plt.title(f\"Training Classes, class {i}\"); \n",
    "\n",
    "    #Right Side, model i's prediction after training\n",
    "    ax = plt.subplot(3,2, 2*i + 2)\n",
    "    plot_mc_data(X_train,pred,legend=True); plt.title(\"Predicted Classes after training\");\n",
    "plt.show\n",
    "```\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "classes=np.unique(y_train)   # three classes, [0,1,2]\n",
    "m,n = X_train.shape          # number of examples, number of features\n",
    "c = len(classes)             # number of classe\n",
    "\n",
    "# storage for our models (w), one column per class\n",
    "W_models = np.zeros((n,len(classes)))   \n",
    "b_models = np.zeros(c)\n",
    "plt.figure(figsize=(14, 14))             \n",
    "\n",
    "for i in classes:\n",
    "    yc = (y_train == classes[i]).astype(float)\n",
    "    yc = yc.reshape(-1,1)  \n",
    "\n",
    "    ### START CODE HERE ### \n",
    "    ### BEGIN SOLUTION ###\n",
    "    w_init = np.zeros((2,1))                                                              ##None \n",
    "    b_init = 0.                                                                           ##None \n",
    "    # call gradient descent, double check your solution with Hint\n",
    "    w_final, b_final,_,_ = gradient_descent(X_train, yc, w_init, b_init,                  ##None \n",
    "                                      compute_cost_logistic_matrix,                       \n",
    "                                      compute_gradient_logistic_matrix,                    \n",
    "                                      predict_logistic_matrix,                             \n",
    "                                      alpha = 1e-2, num_iters=1000)                        \n",
    "    ### END SOLUTION ### \n",
    "    ### END CODE HERE ###         \n",
    "    W_models[:,i] = w_final[:,0]                                                           \n",
    "    b_models[i] = b_final                                                                  \n",
    "    pred =  predict_thresh(X_train, w_final,b_final )                                      \n",
    "\n",
    "    #Left Side, training data in All vs i\n",
    "    ax = plt.subplot(3,2, 2*i + 1)\n",
    "    plot_mc_data(X_train, yc,legend=True); plt.title(f\"Training Classes, class {i}\"); \n",
    "\n",
    "    #Right Side, model i's prediction after training\n",
    "    ax = plt.subplot(3,2, 2*i + 2)\n",
    "    plot_mc_data(X_train,pred,legend=True); plt.title(\"Predicted Classes after training\");\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "<summary>\n",
    "    <b>**Expected Output**:</b>\n",
    "</summary>\n",
    "\n",
    " ![asdf](./figures/C1W3_trainvpredict.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have trained our 3 models we will write a routine to select the best prediction. Recall, the operation involves \n",
    "- making a prediction for each model\n",
    "- picking the largest prediction\n",
    "\n",
    "-Step 1: Given $X$ and matrices `W_model` and `b_models`, perform a prediction resulting in three predictions. This can be implemented in vectorized form as descibed pictorially below. This is not a trivial operation. It is worth spending the time to understand what this is doing. A for loop implementation can also be used.\n",
    "![pic](./figures/C1W3_mcpredict.PNG)  \n",
    "-Step 2: use `np.argmax(axis=1)` to return the **class** of the prediction with the highest value. Note that class is one of [0,1,2] and the index returned by `np.argmax` is, conveniently also [0,1,2]."
   ]
  },
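  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small check of the vectorized form in Step 1, the sketch below compares the matrix expression against an explicit loop over the models. It is a minimal illustration on made-up numbers, not the lab solution; `sigmoid_sketch`, `X_toy`, `W_toy` and `b_toy` are hypothetical names introduced here:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sigmoid_sketch(z):\n",
    "    # logistic function, defined here so the sketch is self-contained\n",
    "    return 1.0 / (1.0 + np.exp(-z))\n",
    "\n",
    "# toy values (made up): 4 examples, 2 features, 3 models\n",
    "X_toy = np.array([[1.0, 2.0], [-1.0, 0.5], [3.0, -2.0], [0.0, 0.0]])   # (m,n)\n",
    "W_toy = np.array([[1.0, -1.0, 0.5], [0.5, 2.0, -1.0]])                 # (n,c)\n",
    "b_toy = np.array([0.1, -0.2, 0.3])                                     # (c,)\n",
    "\n",
    "# vectorized: (m,n) @ (n,c) -> (m,c); b_toy broadcasts across the m rows\n",
    "f_vec = sigmoid_sketch(X_toy @ W_toy + b_toy)\n",
    "\n",
    "# loop version: evaluate each model (one column of W_toy) separately\n",
    "f_loop = np.zeros((X_toy.shape[0], W_toy.shape[1]))\n",
    "for k in range(W_toy.shape[1]):\n",
    "    f_loop[:, k] = sigmoid_sketch(X_toy @ W_toy[:, k] + b_toy[k])\n",
    "\n",
    "# for each example, the index of the model with the largest output\n",
    "picked = f_vec.argmax(axis=1)\n",
    "print(np.allclose(f_vec, f_loop), picked)\n",
    "```\n",
    "\n",
    "Because the sigmoid is monotonic, taking the argmax of `X_toy @ W_toy + b_toy` directly would select the same classes."
   ]
  },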
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "  <summary><font size=\"2\" color=\"darkgreen\"><b>Hints</b></font></summary>\n",
    "\n",
    "```python\n",
    "def predict_mc(X,W,b, verbose = False):\n",
    "    \"\"\"\n",
    "    Computes n predictions and selects the best.\n",
    "    Args:\n",
    "      X : (array_like Shape (m,n)) feature values used in prediction.  \n",
    "      W : (array_like Shape (n,c)) Matrix of parameter. Each column represents 1  model\n",
    "      b : (array_like Shape (c, )) vector of bias parameter. Each column represents 1  model\n",
    "    Returns\n",
    "      sclass: (array_like Shape (m,1)) The selected class the values belong in. Values 0 to c.\n",
    "    \"\"\"\n",
    "    ### START CODE HERE ### \n",
    "    ### BEGIN SOLUTION ###  \n",
    "    z_wb = X @ W + b               #Matrix multiply and add  ##None\n",
    "    f_wb = sigmoid(z_wb)              #sigmoid                  ##None\n",
    "    pclass = f_wb.argmax(axis=1)      #argmax                   ##None\n",
    "    ### END SOLUTION ###  \n",
    "    ### END CODE HERE ### \n",
    "    if verbose: print(\"z_wb.shape\",z_wb.shape); print(z_wb)\n",
    "    if verbose: print(\"pclass\",pclass)\n",
    "    return(pclass)\n",
    "```\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def predict_mc(X,W,b, verbose = False):\n",
    "    \"\"\"\n",
    "    Computes n predictions and selects the best.\n",
    "    Args:\n",
    "      X : (array_like Shape (m,n)) feature values used in prediction.  \n",
    "      W : (array_like Shape (n,c)) Matrix of parameter. Each column represents 1  model\n",
    "      b : (array_like Shape (c, )) vector of bias parameter. Each column represents 1  model\n",
    "    Returns\n",
    "      sclass: (array_like Shape (m,1)) The selected class the values belong in. Values 0 to c.\n",
    "    \"\"\"\n",
    "    ### START CODE HERE ### \n",
    "    ### BEGIN SOLUTION ###  \n",
    "    #Matrix multiply and add\n",
    "    z_wb = X @ W + b  \n",
    "    #sigmoid  \n",
    "    f_wb = sigmoid(z_wb) \n",
    "    #argmax\n",
    "    pclass = f_wb.argmax(axis=1)      \n",
    "    ### END SOLUTION ###  \n",
    "    ### END CODE HERE ### \n",
    "    if verbose: print(\"z_wb.shape\",z_wb.shape); print(z_wb)\n",
    "    if verbose: print(\"pclass\",pclass)\n",
    "    return(pclass)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Test your model\n",
    "tmp_X = np.array([[-2.,-6.],[6,0],[-2,6]])                            #(2,2)\n",
    "tmp_w = np.array([[-1.117, 0.103, 0.963], [-0.863, 1.155, -0.954]])   #(2,3)\n",
    "tmp_b = np.array([-0.267 -1.4577 -0.238])                             #(3, )\n",
    "print(tmp_X.shape, tmp_w.shape, tmp_b.shape)\n",
    "\n",
    "tmp_fw = predict_mc(tmp_X,tmp_w,tmp_b, verbose = True)\n",
    "print(tmp_fw)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "<summary>\n",
    "    <b>**Expected Output**:</b>\n",
    "</summary>\n",
    "\n",
    "```\n",
    "(3, 2) (2, 3) (1,)\n",
    "z_wb.shape (3, 3)\n",
    "[[ 5.4493 -9.0987  1.8353]\n",
    " [-8.6647 -1.3447  3.8153]\n",
    " [-4.9067  4.7613 -9.6127]]\n",
    "pclass [0 2 1]\n",
    "[0 2 1]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we can make a prediction for any point, you can now produce a plot with the decision boundary shown. `plot_mc_decision_boundary` utilizes your `predict_mc` to function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#plot the decison boundary. Pass in our models - the w's and b's assocated with each model and predict_mc\n",
    "plot_mc_decision_boundary(X_train,3, W_models, b_models, predict_mc)\n",
    "plt.title(\"model decision boundary vs original training data\")\n",
    "\n",
    "#add the original data to the decison boundary\n",
    "plot_mc_data(X_train,y_train,[\"blob one\", \"blob two\", \"blob three\"], legend=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "<summary>\n",
    "    <b>**Expected Output**:</b>\n",
    "</summary>\n",
    "\n",
    "![sdf](./figures/C1W3_boundary.PNG)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There you are! You have now build a Multi-Class classifier.\n",
    "\n",
    "Lets try another case. We'll just move the blobs around a bit:\n",
    "## Second Test Case"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# make 3-class dataset for classification\n",
    "centers = [[-5, 0], [0, 1], [5, -1]]\n",
    "X_train, y_train = make_blobs(n_samples=500, centers=centers, cluster_std=1.2,random_state=40)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_mc_data(X_train,y_train,[\"blob one\", \"blob two\", \"blob three\"], legend=True)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# show classes in data set\n",
    "print(f\"unique classes {np.unique(y_train)}\")\n",
    "# show shapes of our dataset\n",
    "print(f\"shape of X_train: {X_train.shape}, shape of y_train: {y_train.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Examaning the plot above, do you see any potential issues with our current approach?\n",
    "\n",
    "Piece together the pieces from above, or create subroutines to create a decision boundary diagram like the one in the first example."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "  <summary><font size=\"2\" color=\"darkgreen\"><b>Hints</b></font></summary>\n",
    "\n",
    "```python\n",
    "classes=np.unique(y_train)   # three classes, [0,1,2]\n",
    "m,n = X_train.shape          # number of examples, number of features\n",
    "c = len(classes)             # number of classe\n",
    "\n",
    "# storage for our models (w), one column per class\n",
    "W_models = np.zeros((n,len(classes)))   \n",
    "b_models = np.zeros(c)\n",
    "\n",
    "for i in classes:\n",
    "    yc = (y_train==classes[i]) + 0\n",
    "    yc = yc.reshape(-1,1)\n",
    "\n",
    "    w_init = np.zeros((2,1))   \n",
    "    b_init = 0.\n",
    "    w_final, b_final,_,_ = gradient_descent(X_train, yc, w_init, b_init,\n",
    "                                      compute_cost_logistic_matrix, \n",
    "                                      compute_gradient_logistic_matrix, \n",
    "                                      predict_logistic_matrix,\n",
    "                                      alpha = 1e-2, num_iters=1000)     \n",
    "    W_models[:,i] = w_final[:,0]\n",
    "    b_models[i] = b_final\n",
    "    pred =  predict_thresh(X_train, w_final,b_final ) \n",
    "\n",
    "plot_mc_decision_boundary(X_train,3, W_models, b_models, predict_mc)\n",
    "plt.title(\"model decision boundary vs original training data\")\n",
    "\n",
    "#add the original data to the decison boundary\n",
    "plot_mc_data(X_train,y_train,[\"blob one\", \"blob two\", \"blob three\"], legend=True)\n",
    "plt.show()\n",
    "```\n",
    "</details>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Rewrite code here\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<details>\n",
    "<summary>\n",
    "    <b>**Expected Output**:</b>\n",
    "</summary>\n",
    "\n",
    "![asdf](./figures/C1W3_example2.PNG)\n",
    "    \n",
    "We will study logistic regression with polynomial features in the next lab. That will allow us to handle situations where purely linear solutions are not enough."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook was informed by an example at scikit-learn.org. The author was Tom Dupre la Tour <tom.dupre-la-tour@m4x.org>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
